(54) Document type definition generating method and apparatus



(57) There is disclosed a document type definition 
generating method comprising, in a structured docu- 
ment provided with a tag having an element name in 
each document element, judging a physical structure of 
each document element from indention, blank lines, and 
positional relation between tags, analyzing words and 
phrases in each document element, and judging a se- 
mantic structure of the document element based on 
words and phrases connection and word types. When 
the physical and semantic structures of document ele- 
ments having tags different in element name are similar, 
the elements are regarded as being of the same type 
and one element name is excluded from a list for gen- 
erating the document type definition. When the physical 
and semantic structures of document elements having 
tags with the same element name are different, the el- 
ements are regarded as being of the different types and 
one element name is changed. Furthermore, the words 
and phrases between a start tag and an end tag with the 
same title are analyzed, and the information to be in- 
cluded between the tags is obtained to generate the 
document type definition. Thereby, tag meaning is cor- 
rectly treated, and the document type definition with tag 
redundancy removed therefrom is generated.
Description

BACKGROUND OF THE INVENTION 

Field of the Invention

[0001] The present invention relates to a computer- 
ized document processing executed by a personal com- 
puter, a word processor, and the like, particularly to a 
method and apparatus for generating the document
type definition of a structured document, and a storage 
medium in which a program is stored. 

Related Background Art 

[0002] In recent years, computerized documents prepared
by a personal computer, a word processor, and the like
have come into wide use. Structured documents are being
introduced, in which the computerized document is treated
consistently and the elements constituting the document
are provided with semantic information. In a structured
document, each document element is held between a start
tag and an end tag including an element name (tag name),
and in many cases each document type is described in
accordance with a document type definition that defines
the place, order, frequency and the like in which each
element appears.

[0003] On the other hand, the structured document 
can be described without preparing the document type 
definition. However, when the documents prepared by 
a plurality of users are integrated to form one document, 
and if the individual users use the tags having arbitrary 
titles, there is a possibility of attaching different tag 
names to the same element, or conversely attaching the 
same tag name to different elements. 
[0004] In this case, there arise problems that the se- 
mantic information attached to the tag cannot correctly 
be handled, and that redundancy is generated with re- 
spect to the tag. 

SUMMARY OF THE INVENTION 

[0005] An objective of the present invention is to pro- 
vide a method and apparatus for generating document 
type definition from a structured document provided with 
tags, and a storage medium which stores the program. 
[0006] Another objective of the present invention is to 
provide a document type definition generating method 
and apparatus which can correctly treat semantic infor- 
mation given to tags, and a storage medium which 
stores the program. 

[0007] A further objective of the present invention is to
provide a document type definition generating method
and apparatus which can generate a document type
definition with tag redundancy removed therefrom, and
a storage medium which stores the program.
[0008] According to one aspect, the present invention
which achieves these objectives relates to a document 
processing method comprising: in a structured docu- 
ment provided with a tag having an element name in 
each document element, a physical structure judging 
step of judging a physical structure of each document 
element; a semantic structure judging step of judging a 
semantic structure of the document element; and a doc- 
ument type definition generating step of generating doc- 
ument type definition to define appearance state of the 
document element in the structured document based on 
judgment results of the physical structure judging step 
and the semantic structure judging step. 
[0009] According to another aspect, the present in- 
vention which achieves these objectives relates to a 
document processing apparatus comprising: in a struc- 
tured document provided with a tag having an element 
name in each document element, physical structure 
judging means for judging a physical structure of each 
document element; semantic structure judging means 
for judging a semantic structure of the document ele- 
ment; and document type definition generating means 
for generating document type definition to define ap- 
pearance state of the document element in the struc- 
tured document based on judgment results of the phys- 
ical structure judging means and the semantic structure 
judging means. 

[0010] According to still another aspect, the present 
invention which achieves these objectives relates to a 
computer-readable storage medium storing a document 
type definition generating program for controlling a com- 
puter to perform document type definition generation, 
the program comprising codes for causing the computer 
to perform, in a structured document provided with a tag 
having an element name in each document element, a 
physical structure judging step of judging a physical 
structure of each document element, a semantic struc- 
ture judging step of judging a semantic structure of the 
document element, and a document type definition gen- 
erating step of generating document type definition to 
define appearance state of the document element in the 
structured document based on judgment results of the 
physical structure judging step and the semantic struc- 
ture judging step. 

[0011] Other objectives and advantages besides 
those discussed above shall be apparent to those skilled 
in the art from the description of a preferred embodiment 
of the invention which follows. In the description, refer- 
ence is made to accompanying drawings, which form a 
part thereof, and which illustrate an example of the in- 
vention. Such example, however, is not exhaustive of 
the various embodiments of the invention, and therefore
reference is made to the claims which follow the description
for determining the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0012] Fig. 1 is a block diagram of a document type 
definition generating apparatus. 
[0013] Fig. 2 is a flowchart showing the procedure of
a document type definition generation processing. 
[0014] Figs. 3A and 3B are diagrams showing exam- 
ples of structured document data. 
[0015] Fig. 4 is a flowchart showing the processing 
procedure of physical structure analysis. 
[0016] Fig. 5 is a flowchart showing the processing 
procedure of semantic structure analysis. 
[0017] Fig. 6 is a flowchart showing the processing 
procedure of removing tag redundancy. 
[0018] Fig. 7 is a diagram showing one example of 
document type definition. 

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0019] A preferred embodiment of the present inven- 
tion will be described hereinafter with reference to the 
accompanying drawings. 

<First Embodiment>

[0020] Fig. 1 is a block diagram of a document type 
definition generating apparatus according to the present 
invention. 

[0021] In Fig. 1, an input unit 101 is constituted of a 
keyboard, a pointing apparatus, and the like, and is used 
for a user to input data or commands. An external mem- 
ory unit 102 is constituted of a storage apparatus using 
media such as a hard disk to store structured document 
data as a processing object, data of semantic informa- 
tion database (DB) described later, generated docu- 
ment type definition, and the like. A display unit 103 is 
constituted of CRT, a liquid crystal display, and the like 
to display the structured document data, the generated 
document type definition, and the like. 
[0022] A CPU 104 performs control of each compo- 
nent of the apparatus, reads and executes a program, 
and realizes various kinds of processing. A ROM 105 stores
fixed data and programs. A control program for realizing
the processing procedures described later with reference
to the flowcharts of Figs. 2 to 6 may be stored in the
ROM 105, or read from the external memory unit 102.
A RAM 106 presents an operation area necessary for 
the processing of the apparatus. A bus 107 connects 
the apparatus components. 

[0023] Fig. 2 is a flowchart showing the procedure of 
a document type definition generation processing ac- 
cording to the present invention. 
[0024] First, the structured document is inputted in
step S201. This is executed by reading the structured
document from the external memory unit 102. One example
of the structured document given herein is shown
in Fig. 3A. For example, in the first line, "<Title>"
indicates a start tag, "</Title>" indicates an end tag,
and "TV SET OPERATING INSTRUCTIONS" held between these
tags is a document element indicating a tag content.
Moreover, "Title" is an element name (tag name).
Furthermore, the attribute and value of the element can be
described in the tag.

[0025] In the next step S202, each tag position is detected
from the structured document, and a tag number
is attached in order from the top "<Title>".
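The tag detection and numbering of step S202 can be sketched as follows. This is only an illustration under stated assumptions: the regular expression `TAG_RE` and the function `number_tags` are hypothetical names that do not appear in the specification, and attributes inside a tag are skipped rather than parsed.

```python
import re

# Hypothetical pattern: matches start tags such as <Title> and end tags such as </Title>.
# Anything after the element name (e.g. attributes) is skipped, not parsed.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*>")

def number_tags(text):
    """Return (tag_number, element_name, kind, offset) for every tag, in document order."""
    tags = []
    for number, match in enumerate(TAG_RE.finditer(text), start=1):
        kind = "end" if match.group(1) else "start"
        tags.append((number, match.group(2), kind, match.start()))
    return tags

sample = (
    "<Title>TV SET OPERATING INSTRUCTIONS</Title>\n"
    "<Body>\n"
    "<Section>1. PLUG IN</Section>\n"
    "</Body>"
)
for entry in number_tags(sample):
    print(entry)
```

Numbering starts at 1 with the top tag, which matches the description of attaching tag numbers in order from "<Title>".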

[0026] Subsequently, in step S203, the physical structure
in the document is detected. For example, as
diagrammatically represented in Fig. 3B for "<Para>",
which indicates a paragraph, a feature is detected in
which a sentence group starting with an indention is
regarded as a paragraph. The processing procedure for
detecting such a physical structure is shown in the
flowchart of Fig. 4.

[0027] First, in step S401, a line in which indention is
performed is found in the document, and in the next step
S402 the sentence group following that line is detected.
Here, the sentence group can be taken to extend from the
line in which the indention is performed up to the line
in which the next indention is performed, or up to the
line right before a blank line. In the processing of the
step S402, indention used to set off a quotation (double
indention) and blank lines inserted by constantly
skipping one or more lines are judged from the pattern
of the entire document and excluded as structures that
are meaningless for detecting the physical structure.
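A minimal sketch of steps S401 and S402 is given below. It assumes the text between tags is available as a list of lines; the function name `sentence_groups` is hypothetical. The exclusion of quotation indention and of regularly inserted blank lines is not implemented here.

```python
def sentence_groups(lines):
    """Group lines into sentence groups: a group starts at an indented line and
    ends before the next indented line or before a blank line."""
    groups, current = [], None
    for line in lines:
        if not line.strip():              # a blank line closes the open group
            if current:
                groups.append(current)
            current = None
        elif line[0] in " \t":            # an indention opens a new group
            if current:
                groups.append(current)
            current = [line.strip()]
        elif current is not None:         # continuation of the open group
            current.append(line.strip())
        # unindented lines outside any group are ignored in this sketch
    if current:
        groups.append(current)
    return groups

lines = ["  First paragraph starts", "and continues here", "  Second paragraph", "", "  Third"]
print(sentence_groups(lines))
```

Each returned group corresponds to one candidate paragraph, which can then be associated with the tag enclosing those lines.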
[0028] Turning back to Fig. 2, in the next step S204,
the semantic structure of the inputted structured
document is detected. As one example, in Fig. 3A, the
contents of tags "<Section>" have forms in which "1.", "2.",
"3." are attached to top positions. Here, the content of
tag "<Section>" can semantically be presumed to have
"numeral." on its top. One example of the processing
procedure for detecting the semantic structure is shown in
the flowchart of Fig. 5.

[0029] First, in step S501, communication is per- 
formed with a semantic information database (DB) 51 
with respect to all words and codes in the document to 
provide the connection between words in the document 
and the types of words and codes. In the next step S502, 
the semantic structure found in each document element 
is detected based on this result. 
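The detection of a "numeral." feature can be illustrated with the sketch below. The table `WORD_TYPES` is a hypothetical stand-in for the semantic information database, which the specification does not define in detail, and only the leading token of each element content is classified.

```python
import re

# Hypothetical stand-in for the semantic information DB:
# each entry maps a token pattern to a word type, tried in order.
WORD_TYPES = [
    (re.compile(r"\d+\."), "numeral."),
    (re.compile(r"\d+"), "numeral"),
    (re.compile(r"[A-Za-z]+"), "word"),
]

def leading_type(content):
    """Return the word type of the first token of an element's content, or None."""
    text = content.strip()
    for pattern, word_type in WORD_TYPES:
        if pattern.match(text):
            return word_type
    return None

def semantic_structure(contents):
    """Return the leading word type shared by all contents, or None if they differ."""
    types = {leading_type(c) for c in contents}
    return types.pop() if len(types) == 1 else None

# Contents of the <Section> tags in Fig. 3A.
print(semantic_structure(["1. PLUG IN", "2.TURN ON POWER", "3.TUNE IN"]))
```

A real semantic database would also supply connections between words; this sketch covers only the word-type lookup needed for the "<Section>" example.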
[0030] Returning to Fig. 2, in the next step S205, a 
first appearing tag is regarded as the tag to be proc- 
essed, and it is judged in step S206 whether or not the 
processing of the tag is all completed. 
[0031] When the tag processing is not completed, the 
process shifts to step S207, in which the tag as the 
present processing object, and the information on the 
physical and semantic structures detected in the steps 
S203 and S204 are unified. Here, unifying means that
when physical and semantic features are present in the
line related to the tag used as the present processing
object, the tag and the information are connected.
Subsequently, in step S208, the process moves to the
next appearing tag, and returns to the step S206.
[0032] On the other hand, when it is judged in the step
S206 that the processing of all tags is completed, the
process shifts to step S209, in which similarity is obtained
between the tags having different titles. When the
similarity is equal to or more than a predetermined threshold
value, the tags are regarded as the same tag, and one
of the tags is prevented from appearing in the document
type definition to be generated. The processing procedure
for obtaining this similarity to determine whether or
not the tags have the same content is shown in the
flowchart of Fig. 6.

[0033] First, the similarity of tags A, B having different
titles is calculated in step S601. In this calculation,
the similarity of the physical structure is set to 1
when the physical structures agree with each other.
When the physical structures do not completely agree
but partially agree with each other, the similarity of
the physical structure is set to a value less than 1
which corresponds to the proportion in agreement. The
same concept is applied to the semantic structure, and
the similarity of the semantic structure is obtained.
Dividing the sum of the similarity of the physical
structure and the similarity of the semantic structure
by 2 gives a general similarity d_AB of A and B.
[0034] In the next step S602, the similarity d_AB obtained
in the step S601 is compared with the predetermined
threshold value δ. When the similarity d_AB is less
than δ, the process jumps to step S604 to try the
next combination.

[0035] When the similarity d_AB is equal to or more than
the threshold value δ, the process shifts to step S603,
in which the tag B is regarded as being of the same type
as the tag A, the tag B is struck off the list for
generating the document type definition, and redundancy is
removed.

[0036] When the processing of the step S603 is com- 
pleted, the process advances to step S604, in which it 
is judged whether or not the trial of combination of all 
tags is made. When the combination of all tags is not 
tried, the process returns to the step S601. When the 
combination of all tags is tried, the subroutine process- 
ing is ended to return to the main routine of Fig. 2. 
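The loop of steps S601 to S604 can be sketched as below. Several points are assumptions made for illustration: each structure is represented as a set of feature labels, the "agreed proportion" is computed as a Jaccard overlap, the feature names are invented, and a tag that has already been struck off is skipped in later combinations.

```python
from itertools import combinations

def agreement(a, b):
    """Proportion of features on which two feature sets agree (1.0 when identical).
    Jaccard overlap is an assumption; the text only asks for the agreed proportion."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def general_similarity(tag_a, tag_b):
    """General similarity: mean of the physical and semantic similarities (step S601)."""
    physical = agreement(tag_a["physical"], tag_b["physical"])
    semantic = agreement(tag_a["semantic"], tag_b["semantic"])
    return (physical + semantic) / 2

def remove_redundant(tags, threshold):
    """Steps S601 to S604: strike off tag B when its similarity to tag A reaches the threshold."""
    removed = set()
    for name_a, name_b in combinations(tags, 2):      # S604: try every combination
        if name_a in removed or name_b in removed:
            continue
        if general_similarity(tags[name_a], tags[name_b]) >= threshold:  # S602
            removed.add(name_b)                                           # S603
    return [name for name in tags if name not in removed]

# Hypothetical feature sets for three element names.
tags = {
    "Title":   {"physical": {"single_line"}, "semantic": {"word"}},
    "Section": {"physical": {"single_line", "in_body"}, "semantic": {"numeral."}},
    "Sect":    {"physical": {"single_line", "in_body"}, "semantic": {"numeral."}},
}
print(remove_redundant(tags, threshold=0.8))
```

With these feature sets, "Section" and "Sect" agree completely, so "Sect" is dropped from the list used for generating the document type definition.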
[0037] Moreover, in the step S209, in addition to the 
above-described processing of Fig. 6, the physical 
structure and semantic structure of the document ele- 
ments having the same title are compared. When the 
structures are different, the title of one of the tags is 
changed. For this purpose, the similarity is obtained be- 
tween tags Aa and Ab having the same tag name in the 
same manner as described above. When the similarity value
d_AaAb is less than the threshold value, the title of the
tag Ab is changed. This threshold value may be different
from the above-described value.
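The title change for same-named tags can be sketched in the same spirit. The specification does not say how the new title is chosen, so the numeric suffix used here is a hypothetical naming scheme, and `overlap` is an assumed similarity function over merged feature sets.

```python
def overlap(a, b):
    """Assumed similarity: Jaccard overlap of two feature sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def rename_divergent(occurrences, similarity, threshold):
    """Given occurrences [(name, features), ...], rename a later occurrence when its
    similarity to the first occurrence of the same name falls below the threshold."""
    first_seen, counters, result = {}, {}, []
    for name, features in occurrences:
        if name not in first_seen:
            first_seen[name] = features
            result.append(name)
        elif similarity(first_seen[name], features) < threshold:
            counters[name] = counters.get(name, 1) + 1
            result.append(f"{name}{counters[name]}")   # hypothetical naming scheme
        else:
            result.append(name)
    return result

occurrences = [
    ("Note", {"indented", "numeral."}),
    ("Note", {"indented", "numeral."}),
    ("Note", {"quoted", "word"}),
]
print(rename_divergent(occurrences, overlap, threshold=0.5))
```

Comparing every occurrence against the first one is a simplification; pairwise comparison among all occurrences would follow the same pattern.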
[0038] In step S210 of Fig. 2, the sentences and words
between the start tag and the end tag which have the same
title are analyzed to obtain the information to be included
in the tags. This analysis result is used to generate the
document type definition in the next step S211.
[0039] Fig. 7 is a diagram showing one example of the 
generated document type definition, and the document 
type definition generated from the structured document
data shown in Fig. 3A is shown as document type "man- 
ual". 

[0040] Here, in Fig. 3A, the content of tag <Sect> 
agrees in physical structure with the content of tag 
<Section>, and the tags are the same in semantic struc- 
ture in that they have the form of "numeral.". Therefore, 
it is determined in the step S209 that the tag <Sect> has
the same content as that of the tag <Section>. As a re- 
sult, the generated document type definition does not 
use <Sect>, and in <Body>, Section+, that is, tag <Sec- 
tion> repeatedly appears. 
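How a document type definition such as the "manual" example could be emitted once redundant element names have been merged is sketched below. The input dictionary `structure` and both function names are hypothetical; the sketch only collapses consecutive repeats of a child into `Name+` and treats childless elements as `#PCDATA`.

```python
def content_model(children):
    """Collapse consecutive repeats of a child name into 'Name+'."""
    model = []
    for name in children:
        if model and model[-1].rstrip("+") == name:
            model[-1] = name + "+"
        else:
            model.append(name)
    return ", ".join(model)

def generate_dtd(doctype, structure):
    """structure maps each element name to its ordered child names ([] means text only)."""
    lines = [f"<!DOCTYPE {doctype} ["]
    for element, children in structure.items():
        model = content_model(children) if children else "#PCDATA"
        lines.append(f"<!ELEMENT {element} ({model})>")
    lines.append("]>")
    return "\n".join(lines)

structure = {
    "manual": ["Title", "Date", "Author", "Body"],
    "Title": [], "Date": [], "Author": [],
    # <Sect> is assumed to have been merged into <Section> by the redundancy step.
    "Body": ["Section", "Section", "Section", "Section"],
    "Section": [],
}
print(generate_dtd("manual", structure))
```

Optional or alternative content (`?`, `*`, `|`) is outside the scope of this sketch.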

<Second Embodiment>

[0041] In the above-described first embodiment, the 
physical and semantic structures in the document are 
judged based on the sentence (portions other than 
tags), but the present invention is not limited to this. 
[0042] For example, the physical information such as 
the relative positional relation between the tags and the 
inclusive relation of the tags is detected as the physical 
structure, or the meaning represented by the tag name 
or attribute is detected as the semantic structure, so that 
these structures may be used as the objects to obtain 
the similarity. 

[0043] According to the embodiments described 
above, since the physical and semantic structures of the 
document element surrounded with the tags are judged, 
and the document type definition of the structured doc- 
ument provided with the tags is generated, the semantic 
information given to the tags can correctly be treated. 
[0044] Furthermore, the redundancy to the tags hav- 
ing the same content can be removed, and the docu- 
ment type definition can be generated in which there are 
no tags being the same in title and different in meaning. 
[0045] Additionally, the present invention may be ap- 
plied to a computer system constituted of a plurality of 
apparatuses (e.g., host computer, interface apparatus, 
reader, printer, and the like), or to a device constituted 
of one apparatus (e.g., word processor, copying ma- 
chine, facsimile device, and the like). 
[0046] Moreover, it goes without saying that the ob- 
jective of the present invention can be achieved by sup- 
plying a storage medium storing the program code of 
software to realize the function of the above-described 
embodiment to the system or the device, and reading 
and executing the program code stored in the storage 
medium by the computer (or CPU or MPU) of the system 
or the device.

[0047] In this case, the program code itself read from
the storage medium realizes the function of the
above-described embodiment, and the storage medium in
which the program code is recorded constitutes the
present invention.

[0048] As the storage medium in which the program
code, and tables and other variable data are stored, for
example, a floppy disk (FD), a hard disk, an optical disk,
an optomagnetic disk, CD-ROM, CD-R, a magnetic
tape, a nonvolatile memory card (IC memory card),
ROM, and the like can be used.
[0049] Moreover, the function of the above-described 
embodiment is realized by executing the program code 
read by the computer, but it goes without saying that the 
present invention also includes a case in which an op- 
erating system (OS) operating on the computer per- 
forms a part or the whole of an actual processing based 
on the instruction of the program code and the function 
of the above-described embodiment is realized by the 
processing. 

[0050] Although the present invention has been described
in its preferred form with a certain degree of
particularity, many apparently widely different embodiments
of the invention can be made without departing
from the spirit and the scope thereof. It is to be under- 
stood that the invention is not limited to the specific em- 
bodiments thereof except as defined in the appended 
claims. 

[0051] Additional aspects and embodiments of the invention
are envisaged in which the document type definition
generation step is based on only a judging step
of judging physical structure. There is disclosed an al- 
ternative system in which the document type definition 
generation step is based on only a judging step of judg- 
ing a semantic structure. 

[0052] Further, the computer program for implement- 
ing the invention can be obtained in electronic form for 
example by downloading the code over a network such 
as the Internet. Thus in accordance with another aspect 
of the present invention there is provided an electrical 
signal carrying processor implementable instructions for 
controlling a processor to carry out the method as here- 
inbefore described. 



Claims 



1. A document type definition generating method,
comprising, in a structured document provided with
a tag having an element name in each document
element:

a physical structure judging step of judging a
physical structure of each document element;
a semantic structure judging step of judging a
semantic structure of said each document element; and
a document type definition generating step of
generating document type definition to define
appearance state of the document element in
said structured document based on judgment
results of said physical structure judging step
and said semantic structure judging step.

2. The document type definition generating method
according to claim 1, wherein said physical structure
judging step comprises judging the physical
structure of the document element based on an
indention or a blank line.

3. The document type definition generating method
according to claim 2, wherein when the physical
structure of the document element is judged based
on said indention, the judging is performed by
excluding the indention which represents quotation.

4. The document type definition generating method
according to claim 2, wherein when the physical
structure of the document element is judged based
on said blank line, the judging is performed by
excluding the blank line from a document in which
description is made by constantly placing every
predetermined number of blank lines.

5. The document type definition generating method
according to claim 1, wherein said physical structure
judging step comprises judging the physical
structure of the document element based on a
positional relation of the tags surrounding the
document element.

6. The document type definition generating method
according to claim 1, wherein said semantic structure
judging step comprises referring to a semantic
information database to judge the semantic structure
of the document element based on words and
phrases connection in a document and word types.

7. The document type definition generating method
according to claim 1, wherein said semantic structure
judging step comprises judging the semantic
structure of the document element based on a
meaning represented by the tags surrounding the
document element.



8. The document type definition generating method
according to claim 1, wherein said document type
definition generating step comprises a redundancy
removing step of, when the physical structure and
the semantic structure of a plurality of document
elements having the tags different in element name
are similar, regarding the document elements as
being of the same type and excluding one element
name from a document type definition generating
object based on the judgment results of said physical
structure judging step and said semantic structure
judging step.

9. The document type definition generating method
according to claim 8, wherein said redundancy removing
step comprises obtaining similarity degrees
concerning agreement degrees of the physical
structure and the semantic structure between the
document elements having the tags different in
element name, and regarding the document elements
as being of the same type when a general similarity
value calculated from the similarity degrees is equal
to or more than a predetermined threshold value.

10. The document type definition generating method
according to claim 1, wherein said document type
definition generating step comprises a title changing
step of, when the physical structure and the
semantic structure of a plurality of document elements
having the tags with the same element name are
different, regarding the document elements as being
of different types and changing one element
name based on the judgment results of said physical
structure judging step and said semantic structure
judging step.

11. The document type definition generating method
according to claim 1, wherein said document type
definition generating step comprises analyzing
words and phrases present between a start tag and
an end tag having the same title, obtaining information
to be included between the tags, and generating
the document type definition based on the
information.

12. A document type definition generating apparatus
comprising: in a structured document provided with
a tag having an element name in each document
element,

physical structure judging means for judging a
physical structure of said each document element;

semantic structure judging means for judging a
semantic structure of said each document element; and

document type definition generating means for
generating document type definition to define
appearance state of the document element in
said structured document based on judgment
results of said physical structure judging means
and said semantic structure judging means.

13. The document type definition generating apparatus
according to claim 12, wherein said physical structure
judging means judges the physical structure of
the document element based on an indention or a
blank line.

14. The document type definition generating apparatus
according to claim 13, wherein said physical structure
judging means judges the physical structure of
the document element based on said indention by
excluding the indention which represents quotation.

15. The document type definition generating apparatus
according to claim 13, wherein said physical structure
judging means judges the physical structure of
the document element based on said blank lines by
excluding the blank lines from a document in which
description is made by constantly placing every
predetermined number of blank lines.

16. The document type definition generating apparatus 
according to claim 12, wherein said physical struc- 
ture judging means judges the physical structure of 
the document element based on a positional rela- 
tion of the tags surrounding the document element. 

17. The document type definition generating apparatus 
according to claim 12, wherein said semantic structure
judging means refers to a semantic information
database to judge the semantic structure of the doc- 
ument element based on words and phrases con- 
nection in a document and word types. 

18. The document type definition generating apparatus
according to claim 12, wherein said semantic structure
judging means judges the semantic structure
of the document element based on a meaning
represented by the tags surrounding the document
element.

19. The document type definition generating apparatus 
according to claim 12, wherein said document type 
definition generating means comprises redundancy 
removing means for, when the physical structure 
and the semantic structure of a plurality of docu- 
ment elements having the tags different in element 
name are similar, regarding the document elements 
as being of the same type and excluding one ele- 
ment name from a document type definition gener- 
ating object based on the judgment results of said 
physical structure judging means and said semantic 
structure judging means. 

20. The document type definition generating apparatus 
according to claim 19, wherein said redundancy re- 
moving means obtains similarity degrees concern- 
ing agreement degrees of the physical structure 
and the semantic structure between the document 
elements having the tags different in element name, 
and regards the document elements as being of the 
same type when a general similarity value calculated
from the similarity degrees is equal to or more
than a predetermined threshold value.

21. The document type definition generating apparatus
according to claim 12, wherein said document type
definition generating means comprises title changing
means for, when the physical structure and the
semantic structure of a plurality of document
elements having the tags with the same element name
are different, regarding the document elements as
being of different types and changing one element
name based on the judgment results of said physical
structure judging means and said semantic
structure judging means.

22. The document type definition generating apparatus 
according to claim 12, wherein said document type 
definition generating means analyzes words and 
phrases present between a start tag and an end tag 
having the same title, obtains information to be in- 
cluded between the tags, and generates the docu- 
ment type definition based on the information. 

23. A computer-readable storage medium storing a
document type definition generating program for
controlling a computer to perform document type
definition generation, said program comprising
codes for causing the computer to perform:

in a structured document provided with a tag
having an element name in each document
element, a physical structure judging step of
judging a physical structure of each document
element;

a semantic structure judging step of judging a
semantic structure of said each document element; and

a document type definition generating step of
generating document type definition to define
appearance state of the document element in
said structured document based on judgment
results of said physical structure judging step
and said semantic structure judging step.

24. A method of removing redundancy in tags associated
with elements of a document comprising comparing
tags to obtain a similarity value, comparing
the similarity value with a threshold to detect
redundancy, and cancelling one of the tags if redundancy
is detected.

25. A method of removing redundancy in tags associated
with elements of a document comprising analysing
elements having the same tags and cancelling
one of the tags if the elements are identical.

26. A document type definition generating method,
comprising, in a structured document provided with
a tag having an element name in each document
element:

a physical structure judging step of judging a
physical structure of each document element;
a document type definition generating step of
generating document type definition to define
appearance state of the document element in
said structured document based on judgment
results of said physical structure judging step.

27. A document type definition generating method,
comprising, in a structured document provided with
a tag having an element name in each document
element:

a semantic structure judging step of judging a
semantic structure of said each document element; and

a document type definition generating step of
generating document type definition to define
appearance state of the document element in
said structured document based on judgment
results of said semantic structure judging step.

28. An electrical signal carrying processor implementable
instructions for controlling a processor to carry
out the method of any one of claims 1 to 11.
FIG. 3A

<Title>TV SET OPERATING INSTRUCTIONS</Title>
<Date>1998.2.1</Date>
<Author>TARO YAMADA</Author>
<Body>
<Section>1. PLUG IN</Section>
<Section>2.TURN ON POWER</Section>
<Section>3.TUNE IN</Section>
<Sect>4. CONTROL VOLUME</Sect>
</Body>
FIG. 7

<!DOCTYPE manual [
<!ELEMENT manual (Title, Date, Author, Body)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Body (Section+)>
<!ELEMENT Section (#PCDATA)>
]>



